Abstract of JP 11249867 (A)

PROBLEM TO BE SOLVED: To provide a voice
browser system which enables even a visually
handicapped person to acquire WWW
information. SOLUTION: This system includes a
server 100 that has a voice request acquisition
means 101 which acquires a request from a client
200 via voice input, a voice recognition
means 102 which recognizes the spoken
request input from the means 101, a request
transmission means 103 which, based on the
recognition result of the means 102, transmits a
request over the Internet 70 to the URL that is
designated by the client 200, a voice data
generation means 104 which extracts a read-aloud
text from the response returned from the Internet 70
and converts the text into voice data by speech
synthesis, and a voice data transmission means 105
which transmits the voice data generated by the
means 104 to the client 200. The system also
includes the client 200 that has a voice input means
201 which inputs the user's requests by voice, a
request issue means 202 which extracts the URL
from the result acquired from the server 100 and
requests an HTML file from the server 100 based on
the extracted URL, and a voice output means 203
which outputs the voice data received from the
server 100.
JPO and INPIT are not responsible for any
damages caused by the use of this translation. 

1. This document has been translated by computer. So the translation may not reflect the original 
precisely. 

2. **** shows the word which can not be translated.

3. In the drawings, any words are not translated.



CLAIMS 



[Claim(s)] 

[Claim 1]A voice browser system in which, in a client/server system consisting of computers and an
internetwork, information accumulated in a server is output as speech on the client side, wherein
said server has: a voice request acquisition means which acquires a request made by voice input
from a client; a voice recognition means which performs speech recognition on the spoken request
input from said voice request acquisition means; a request transmitting means which, based on the
result recognized by said voice recognition means, transmits said request over the Internet to the
URL specified by said client; a voice data creating means which extracts a read-aloud text from the
response acquired from said Internet, converts it into voice data, and synthesizes speech; and a
voice data transmitting means which transmits the voice data generated by said voice data creating
means to said client; and wherein said client has: a voice input means which inputs a user's spoken
request; a request issuing means which extracts a URL from the result acquired from said server and
requests an HTML file from the server based on that URL; and a voice output means which outputs the
voice data transmitted from said server.

[Claim 2]The voice browser system according to claim 1, wherein said voice recognition means
comprises:

A link-items list consisting of candidates similar to the request in the voice data input from said
client.

A morphological-analysis means which performs morphological analysis of said link-items list.



[Claim 3]The voice browser system according to claim 1, wherein said voice data creating means
includes a translating means which translates the response acquired from said Internet into a
language which said client wishes.

[Claim 4]The voice browser system according to claim 1, wherein said voice output means includes a
reproduction means which plays back the voice data received from said server when that data is
music content or reading content.

[Claim 5]The voice browser system according to claim 4, wherein said reproduction means has a pause
mode, a reproduction mode, and a stop mode.

[Claim 6]The voice browser system according to claim 4 or 5, wherein said reproduction means
includes an interruption means which outputs interrupting voice information during reproduction of
said voice data.



[Translation done.] 
DETAILED DESCRIPTION



[Detailed Description of the Invention] 
[0001] 

[Field of the Invention]This invention relates to a voice browser system, and in particular to a
voice browser system of client/server configuration, consisting of computers and a network, in which
requests to a server of the World-Wide-Web (hereafter simply WWW) system on the Internet are input
by voice from the microphone of a client terminal, and the information accumulated in the server is
output as speech.
[0002] 

[Description of the Prior Art]As is well known, in a WWW system, when the server and client hardware
and the software are appropriately configured on a network, the text and image information stored in
the server can be displayed on the client screen and browsed by using a browser, such as Netscape
Navigator, installed on the client terminal.

[0003]If specific information on the screen is selected with a mouse or the like in this system,
the information related to it can be accessed, displayed on the screen, and browsed (below, this is
described as a link being established, and the specific information selected is called a link item).
These services presuppose that information is received visually.

Because the information cannot be used unless the user looks at the screen, there is the drawback
that it cannot be offered to a visually impaired person at all.

As a method of solving this, the latest speech recognition and speech synthesis technology can be
used to input by voice from a microphone and to output by synthesized speech. For example, if
[Prime Minister's official residence] is input with ** by voice, the information on the [Prime
Minister's official residence] can be accessed, and its text part can be output as synthesized
speech from the loudspeaker of the client terminal.
[0004] 

[Problem to be solved by the invention]However, when the above-mentioned conventional method is
applied to well-known WWW information, the actual situation is that, in addition to long texts in
which link parts are scattered throughout, sometimes as many as ten or twenty of them, and color
image information, information that appeals to vision, such as animations and images carrying
links, is used in abundance. The problem of how to output such information to a visually impaired
person remains.

[0005]This invention was made in view of the above point, and its object is to provide a voice
browser system with which even a visually impaired person can acquire WWW information.
[0006] 

[Means for solving problem] Drawing 1 is a principle block diagram of this invention. This invention
(Claim 1) is a voice browser system in which, in a client/server system consisting of computers and
an internetwork, the information accumulated in the server 100 is output as speech on the client
200 side. The server 100 has the voice request acquisition means 101, which acquires a request made
by voice input from the client 200; the voice recognition means 102, which performs speech
recognition on the spoken request input from the voice request acquisition means 101; the request
transmitting means 103, which, based on the result recognized by the voice recognition means 102,
transmits the request over the Internet 70 to the URL specified by the client 200; the voice data
creating means 104, which extracts a read-aloud text from the response acquired from the Internet
70, converts it into voice data, and synthesizes speech; and the voice data transmitting means 105,
which transmits the voice data generated by the voice data creating means 104 to the client 200.
The client 200 has the voice input means 201, which inputs a user's spoken request; the request
issuing means 202, which extracts a URL from the result acquired from the server 100 and, based on
that URL, requests from the server the HTML file published on the Internet 70; and the voice output
means 203, which outputs the voice data transmitted from the server 100.
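
The server-side flow of means 102 to 105 can be pictured with a minimal Python sketch. This is an illustration only, not the implementation of this invention: the class name VoiceBrowserServer and the four injected callables are hypothetical, and each stands in for an engine that is not specified here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceBrowserServer:
    """Hypothetical wiring of the server-side means; every callable is an assumed stand-in."""

    recognize: Callable[[bytes], str]    # means 102: spoken request -> URL of the chosen link item
    fetch: Callable[[str], str]          # means 103: URL -> HTML response from the Internet
    extract_text: Callable[[str], str]   # means 104 (first half): HTML -> read-aloud text
    synthesize: Callable[[str], bytes]   # means 104 (second half): text -> voice data

    def handle(self, request_audio: bytes) -> bytes:
        """Process one spoken request and return the voice data sent back to the client (means 105)."""
        url = self.recognize(request_audio)
        html = self.fetch(url)
        text = self.extract_text(html)
        return self.synthesize(text)
```

Passing the engines in as callables keeps the sketch independent of any particular recognition or synthesis product; in the embodiment these roles are spread over several workstations.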

[0007]This invention (Claim 2) includes, in the voice recognition means 102, a link-items list
consisting of candidates similar to the request in the voice data input from the client 200, and a
morphological-analysis means which performs morphological analysis of the link-items list. This
invention (Claim 3) includes, in the voice data creating means 104, a translating means which
translates the response acquired from the Internet 70 into the language the client wishes.
[0008]This invention (Claim 4) includes, in the voice output means 203, a reproduction means which
plays back the voice data received from the server 100 when that data is music content or reading
content. In this invention (Claim 5), the reproduction means has a pause mode, a reproduction mode,
and a stop mode.
[0009]This invention (Claim 6) includes, in the reproduction means, an interruption means which
outputs interrupting voice information during reproduction of voice data. As described above, this
invention is a system that makes it possible to convert files in HTML (Hyper Text Markup Language)
form published on the Internet from visual information into voice information through a
commercially available web browser, and to provide them to a user. Because voice is used when
acquiring information on the client side, operation by a visually impaired person is also possible.
[0010] 

[Mode for carrying out the invention] Drawing 2 shows the configuration of a system to which this
invention is applied. The system shown in the figure arranges the processing engines on a
high-speed network and, by sharing the load among them, achieves a high-speed response at the
client terminal 10. The system in the figure is broadly divided into two subsystems.

[0011]The first is a processing part serving as the front end, made up of the workstations 20, 30,
and 40 in the figure. The workstations 20 and 30 are systems for providing translation service. The
workstation 40 provides a function that is also in general use on the Internet, and in this system
it is used mainly for caching data and for converting kanji codes.
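
As a rough illustration of the two roles of the workstation 40 (caching and kanji code conversion), the following Python sketch may help. It is not the proxy of this invention: the class name CachingTranscoder, the injected fetch function, and the choice of Shift_JIS as source and EUC-JP as target encoding are all assumptions made for the example, since the encodings are not named here.

```python
from typing import Callable, Dict


class CachingTranscoder:
    """Hypothetical stand-in for a proxy that converts kanji codes and caches responses."""

    def __init__(self, fetch: Callable[[str], bytes],
                 source_encoding: str = "shift_jis",
                 target_encoding: str = "euc_jp") -> None:
        self._fetch = fetch               # assumed upstream fetch function
        self._source = source_encoding    # assumed encoding of the remote page
        self._target = target_encoding    # assumed encoding expected by the client
        self._cache: Dict[str, bytes] = {}
        self.upstream_calls = 0           # counts real fetches, to show the cache working

    def get(self, url: str) -> bytes:
        """Return the page at url in the target encoding, fetching it at most once."""
        if url not in self._cache:
            self.upstream_calls += 1
            raw = self._fetch(url)
            text = raw.decode(self._source)
            self._cache[url] = text.encode(self._target)
        return self._cache[url]
```

A second request for the same URL is answered from the cache, so the upstream fetch function is called only once per page in this sketch.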
[0012]The second is the back end, made up of the workstations 50 and 60. The workstation 50
compares and collates the voice data input by the user at the client terminal 10 against a
candidate list (a list of link items) and selects a suitable item. So that the voice data input by
the user need not match a link item exactly, morphological-analysis processing is applied to the
link items. As a result, a suitable link item can be inferred and selected even when the user
inputs only a fragment of it. The workstation 60 hosts the voice synthesis engine, which receives
the text information extracted at the client terminal 10 (together with a parameter for each
language type) and generates voice data.

[0013]Next, the HTML file request transmitted from the client terminal 10 is sent to the external
Internet 70 via the proxy server running on the workstation 20 and the proxy server of the
workstation 40. The response data (HTML file) returned from the Internet 70 passes through the
proxy server of the workstation 40; the proxy server of the workstation 20 then requests each
translation engine to process the data and transmits the response to the client terminal 10 (the
translation is only requested at this point; the processing result is transmitted to the client
terminal 10 whenever the user asks for it). The HTML file that reaches the client terminal 10 is
analyzed, the text information that should be presented is transmitted to the voice synthesis
engine of the workstation 60 and converted into voice data, and the voice data is output at the
client terminal 10. The list of link items required for speech recognition undergoes
morphological-analysis processing by the morphological-analysis engine of the workstation 50 and is
passed to the speech recognition engine, which then waits for voice data to be transmitted from the
client terminal 10.

[0014]Next, the voice data (an instruction) input at the client terminal 10 is transmitted to the
speech recognition engine of the workstation 50, which compares and collates it with the link items
(text information) registered earlier and obtains a suitable result. The result is transmitted to
the client terminal 10. The client terminal 10 then extracts the URL from the link item and
acquires the next information from the Internet 70 via the proxy servers of the workstations 20 and
40.
[0015] 

[Working example]Below, an embodiment of this invention is described with reference to the
drawings. Drawing 3 shows the system configuration of one embodiment of this invention. In the
figure, the same elements as in drawing 2 are given the same reference numerals. The system shown
in the figure comprises the client terminal 10 and the workstations 20, 30, 40, and 50.
[0016]The client terminal 10 comprises the web browser 11, the button monitoring program 12 for
voice input, the voice browser client 13, the button 15 for a voice input start, which is connected
to the button monitoring program 12 for voice input, and the loudspeaker 16 and the microphone 17,
which are connected to the voice browser client 13. The workstation 20 has the proxy server 21 and
the English-Japanese translation engine 22, which translates text from English into Japanese.

[0017]The workstation 30 has the Japanese-English translation engine 31, which translates text from
Japanese into English. The workstation 40 functions as the proxy server 41. The workstation 50 has
the speech recognition I/F program 51, the speech recognition engine 52, and the
morphological-analysis engine 53.

[0018]The workstation 60 has the voice synthesis engine 61. The web browser 11 is a browser in
common use, for example Netscape Navigator, and this example is explained using such a browser.
Acting as the window to the Internet 70, the web browser 11 acquires the required information and
passes it to the voice browser client 13. The information is also displayed on the web browser. The
voice browser client 13 analyzes the information acquired from the web browser 11 and transmits the
text information to be read aloud to the voice synthesis engine 61 of the workstation 60, and the
information on the link items to be recognized to the speech recognition I/F program 51 of the
workstation 50.
[0019]The client terminal 10 plays the voice data received from the voice synthesis engine 61
through the loudspeaker 16 while recording it on a local disk. Input from the user begins when the
user presses the button 15 for a voice input start, which the button monitoring program 12 for
voice input reports to the voice browser client 13. On receiving the notice, the voice browser
client 13 starts recording audio from the microphone 17. When the user releases the button 15, the
voice browser client 13 stops recording and transmits the recorded voice data to the speech
recognition I/F program 51 of the workstation 50.
[0020]The proxy server 21 of the connection **** workstation 20 transmits the information (HTML
file) that is to be sent from the Internet 70 to the client terminal 10 to each translation engine
(the English-Japanese translation engine 22 and the Japanese-English translation engine 31) and has
them perform translation processing. The translation result is recorded in the storage device of
each of the translation engines 22 and 31, and when a translation request arrives from a user, the
result is transmitted to the client terminal 10.

[0021]The proxy server 41 of the workstation 40 has functions such as converting part of the
information from the Internet 70 (kanji codes, etc.) and caching the information temporarily. The
speech recognition I/F program 51 of the workstation 50 inputs the link items transmitted from the
voice browser client 13 into the morphological-analysis engine 53 and, working from the character
strings that the engine outputs decomposed by part of speech, reconstructs suitable phrases. The
result is registered in the speech recognition engine 52. The recorded voice data transmitted from
the voice browser client 13 is then passed to the speech recognition engine 52, and the result of
the comparison in the speech recognition engine 52 is returned to the voice browser client 13.

[0022]The voice synthesis engine 61 of the workstation 60 receives the text information to be read
aloud, which was extracted by the voice browser client 13, generates voice data, and returns it to
the voice browser client 13. Drawing 4 is a sequence chart of operation of one embodiment of this
invention. First, when a user presses the button 15 for a voice input start (Step 101), the button
monitoring program 12 for voice input sends a notice to the voice browser client 13. The voice
browser client 13 starts recording from the microphone 17 and waits for the end of input. When the
user releases the button (Step 102), the button monitoring program 12 for voice input sends an end
notice to the voice browser client 13. On receiving it, the voice browser client 13 stops recording
and transmits the recorded voice data to the speech recognition I/F program 51 of the workstation
50 (Step 103).
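
Steps 101 to 103 amount to a push-to-talk recorder. The following Python sketch shows that behavior under stated assumptions: the class PushToTalkRecorder, its method names, and the send callback (standing in for transmission to the speech recognition I/F program 51) are hypothetical and are not taken from the embodiment.

```python
from typing import Callable, List


class PushToTalkRecorder:
    """Hypothetical model of the voice browser client's recording behavior."""

    def __init__(self, send: Callable[[bytes], None]) -> None:
        self._send = send                 # assumed callback that transmits the recording
        self._recording = False
        self._chunks: List[bytes] = []

    def button_pressed(self) -> None:
        """Step 101: start a fresh recording."""
        self._recording = True
        self._chunks = []

    def audio_chunk(self, chunk: bytes) -> None:
        """Keep microphone data only while the button is held down."""
        if self._recording:
            self._chunks.append(chunk)

    def button_released(self) -> None:
        """Steps 102 and 103: stop recording and transmit what was captured."""
        if self._recording:
            self._recording = False
            self._send(b"".join(self._chunks))
```

Audio that arrives before the press or after the release is ignored, which mirrors the start and end notices exchanged in the sequence chart.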
[0023]The speech recognition I/F program 51 of the workstation 50 receives the data, transmits it
to the speech recognition engine 52 (Step 104), and requests recognition processing. When the
speech recognition I/F program 51 acquires the recognition result from the speech recognition
engine 52 (Step 105), it transmits that result to the voice browser client 13 (Step 106). The voice
browser client 13 obtains the URL from the link item in the result and issues a request
transmission instruction to the web browser 11 (Step 107).

[0024]The web browser 11 transmits a data request message to the specified URL on the Internet 70
via the proxy servers 21 and 41 (Step 108). When the proxy server 21 acquires the response to the
request from the Internet 70 via the proxy server 41 (Step 109), it transmits the response to the
English-Japanese translation engine 22 or the Japanese-English translation engine 31 (Step 110).

[0025]Either the English-Japanese translation engine 22 or the Japanese-English translation engine
31 performs the processing directed by the proxy server 21 and returns the result to the proxy
server 21. The proxy server 21 then transmits the response data to the web browser 11 of the client
terminal 10 (Step 111). When the web browser 11 of the client terminal 10 receives the response
data, it passes the data to the voice browser client 13 (Step 112). If the page is multi-frame, the
above processing is repeated from the web browser 11 for each constituent frame. The voice browser
client 13 analyzes the acquired response data, acquires the text information, the text displayed as
link items, and so on, and transmits them respectively to the voice synthesis engine 61 of the
workstation 60 and to the speech recognition I/F program 51 of the workstation 50 (Step 113). HTML
analysis is performed at this point, and when there is text information such as an explanatory note
attached to image (picture) information, that information is also processed appropriately and
transmitted to the voice synthesis engine 61, so that the details of the image are conveyed to the
user by reading aloud from the loudspeaker 16. Text is transmitted to the voice synthesis engine 61
sentence by sentence and according to language, which makes it possible to offer the user suitable
services (rewind and fast-forward in units of one sentence). Recognition candidate list information
that changes dynamically, such as link items, is transmitted to the speech recognition I/F program
51 (Step 114). Fixed commands are not transmitted each time.
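
The sentence-unit rewind and fast-forward mentioned above can be sketched in Python as follows. This is a simplified illustration: the function split_sentences, the class SentenceCursor, and the rule of splitting on ".", "!", "?" and the Japanese full stop are assumptions, and real text would need a more careful sentence splitter.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Split text after '.', '!', '?' or the Japanese full stop (a deliberately naive rule)."""
    parts = re.findall(r"[^.!?。]+[.!?。]?", text)
    return [part.strip() for part in parts if part.strip()]


class SentenceCursor:
    """Hypothetical playback position that moves one sentence at a time."""

    def __init__(self, sentences: List[str]) -> None:
        self.sentences = sentences
        self.index = 0

    def current(self) -> str:
        return self.sentences[self.index]

    def forward(self) -> str:
        """Fast-forward by one sentence, stopping at the last one."""
        self.index = min(self.index + 1, len(self.sentences) - 1)
        return self.current()

    def rewind(self) -> str:
        """Rewind by one sentence, stopping at the first one."""
        self.index = max(self.index - 1, 0)
        return self.current()
```

Because each sentence is a separate unit, the client only needs an index to know which piece of voice data to play next.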

[0026]The speech recognition I/F program 51 first has the morphological-analysis engine 53 perform
morphological-analysis processing on the received link items (Steps 115 and 116), registers the
result together with the fixed commands in the speech recognition engine 52 (Step 117), and waits
for voice data from the user. This registration is needed for the comparison and collation with the
voice data transmitted by the user.
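
The split between fixed commands, which are not retransmitted, and link items, which change with every page, can be modeled as below. This is a sketch under assumptions: the class RecognitionVocabulary and its methods are hypothetical, and a real speech recognition engine would hold acoustic and language models rather than plain strings.

```python
from typing import FrozenSet, Iterable, Set


class RecognitionVocabulary:
    """Hypothetical candidate list: fixed commands plus the current page's link phrases."""

    def __init__(self, fixed_commands: Iterable[str]) -> None:
        self._fixed: FrozenSet[str] = frozenset(fixed_commands)  # registered once
        self._dynamic: Set[str] = set()                          # replaced per page

    def register_page(self, link_phrases: Iterable[str]) -> None:
        """Replace the dynamic candidates when a new page arrives."""
        self._dynamic = set(link_phrases)

    def candidates(self) -> Set[str]:
        """Everything the recognizer would currently compare spoken input against."""
        return set(self._fixed) | self._dynamic
```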

[0027]The voice synthesis engine 61 synthesizes speech from the received text and transmits the
synthesized voice data to the voice browser client 13 (Step 118). Next, a concrete example is
described. Drawing 5 is a figure showing the user interface of one embodiment of this invention.
Drawing 6 is a figure showing an example read aloud by the voice browser of one embodiment of this
invention.

[0028]The figure shows the user interface of the web browser 11. The title 110, the link items 130,
and the text 120 are displayed on the page as text information. This information is acquired from
a file written according to HTML grammar; the voice browser client 13 analyzes the file and
classifies its content into a title, link items, and body text. The output information is then
read aloud as shown in drawing 5. By adding suitable guidance, the user is given the information in
detail.
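
The classification into title, link items, and body text can be illustrated with Python's standard html.parser module. The sketch below is an assumption about how such an analysis might look, not the analysis performed by the voice browser client 13; the class name PageClassifier and its attributes are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class PageClassifier(HTMLParser):
    """Hypothetical analyzer that sorts page text into title, link items, and body text."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.link_items: List[Tuple[str, str]] = []   # (link text, URL)
        self.body_text: List[str] = []
        self._in_title = False
        self._href: Optional[str] = None              # set while inside <a href=...>
        self._link_text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._link_text = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._href is not None:
            self.link_items.append(("".join(self._link_text).strip(), self._href))
            self._href = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()
        elif self._href is not None:
            self._link_text.append(data)
        elif data.strip():
            self.body_text.append(data.strip())
```

In this sketch the link text would go to the recognition side and the title and body text to the synthesis side.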
[0029]When designating a link item by voice, the user can access it by inputting only a distinctive
word, without inputting the whole text of the link item into the voice browser client 13. The
mechanism is as follows: first, the whole text of the link item is input into the
morphological-analysis engine 53, and the words are then recombined from the result of the
part-of-speech decomposition. By reconstructing compounds and the like from words, which are the
minimum elements, the system can handle user input ranging from a single word to a compound.
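
The recombination of words into compounds can be sketched in Python as follows. Two points are assumptions made purely for illustration: whitespace splitting stands in for the morphological-analysis engine 53, and every contiguous run of words is treated as a candidate phrase. The function names are hypothetical.

```python
from typing import Dict, List


def candidate_phrases(tokens: List[str]) -> List[str]:
    """Rebuild every contiguous run of tokens, from single words to the whole item."""
    phrases = []
    for start in range(len(tokens)):
        for end in range(start + 1, len(tokens) + 1):
            phrases.append(" ".join(tokens[start:end]))
    return phrases


def build_vocabulary(link_items: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each candidate phrase to the URLs of the link items that contain it."""
    vocabulary: Dict[str, List[str]] = {}
    for text, url in link_items.items():
        # Whitespace splitting is only a stand-in for a real morphological analyzer.
        for phrase in candidate_phrases(text.lower().split()):
            urls = vocabulary.setdefault(phrase, [])
            if url not in urls:
                urls.append(url)
    return vocabulary
```

With such a table, a recognized fragment like "official residence" resolves to the same URL as the full link text. A phrase shared by several link items maps to several URLs, and choosing among them is left open in this sketch.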

[0030] Drawing 7 is an example of the form of a homepage displayed in a multi-frame configuration
by the web browser of one embodiment of this invention.

Drawing 8 is an example read aloud by the voice browser of one embodiment of this invention.
In this case, by analyzing the HTML file, the voice browser client 13 learns that the page has two
or more frames and so on, and conveys this to the user by voice. Reading aloud is performed frame
by frame.
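
Detecting a multi-frame page and announcing its frames can be sketched with Python's html.parser. The class FrameLister, the function announce_frames, and the wording of the announcement are hypothetical; the actual guidance read aloud is shown in drawing 8 and is not reproduced here.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class FrameLister(HTMLParser):
    """Hypothetical analyzer that collects (name, src) for each <frame> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.frames: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "frame":
            attributes = dict(attrs)
            self.frames.append((attributes.get("name") or "", attributes.get("src") or ""))


def announce_frames(frames: List[Tuple[str, str]]) -> List[str]:
    """Build guidance sentences; the wording is an assumption for illustration."""
    lines = [f"This page has {len(frames)} frames."]
    for number, (name, src) in enumerate(frames, start=1):
        lines.append(f"Frame {number}: {name or src}")
    return lines
```

Each frame's src would then be fetched and read aloud in turn, one frame at a time.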
[0031] Drawing 9 is an example in which an image is displayed by the web browser of one embodiment
of this invention.

Drawing 10 is an example read aloud by the voice browser of one embodiment of this invention.
Since the image shown in drawing 9 is visual information, it is difficult to convey it by voice.
However, when an explanation has been added to the image, the voice browser client 13 analyzes the
HTML tag information, extracts the explanation corresponding to the image, and reads it out by
voice. This assumes that the HTML author has added an explanatory note about the image as text
information.
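
One common place for such an explanatory note is the alt attribute of an img tag. The Python sketch below extracts it, on the assumption that the note is stored there; which tag or attribute carries the explanation is not specified here, and the class name ImageDescriptionExtractor is hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class ImageDescriptionExtractor(HTMLParser):
    """Hypothetical analyzer that collects non-empty alt text from <img> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.descriptions: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:  # skip images whose author supplied no explanation
                self.descriptions.append(alt)
```

Images without a description are simply skipped in this sketch, which matches the stated precondition that the HTML author must supply the explanation.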

[0032]Next, the reproduction control function is explained. This function allows the read-aloud
speed, the volume, the sex of the speaker, and so on to be changed in real time by spoken
instruction. For the read-aloud speed and the sex of the speaker, real-time change is achieved by
changing the parameters of the voice synthesis engine 61, re-creating the voice data, and
performing the re-creation preferentially from the current reproduction point. The volume is
handled by changing a system parameter.
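
Re-creating voice data preferentially from the reproduction point can be pictured as an ordering problem. In the Python sketch below, the parameter class SynthesisParams, its fields, and the decision to re-create already played sentences afterwards are assumptions for illustration only.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class SynthesisParams:
    """Hypothetical synthesis settings; the field names are not from the embodiment."""

    speed: float = 1.0
    speaker: str = "female"


def resynthesis_order(sentence_count: int, playback_index: int) -> List[int]:
    """Sentence indices in the order they would be re-created after a parameter change."""
    upcoming = list(range(playback_index, sentence_count))   # needed first
    already_played = list(range(0, playback_index))          # re-created later (assumption)
    return upcoming + already_played
```

Starting at the sentence currently being played lets the new speed or speaker take effect without waiting for the whole page to be synthesized again.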

[0033]Interruption by specific services (announcement of the current time, etc.) can be added as an
option of the reproduction control function. Specifically, if the current time is asked for during
reading aloud, the time is obtained from the system, the voice synthesis engine 61 is asked to
create the voice data, reading aloud is temporarily interrupted once the creation is complete, and
the current time is announced. The function is realized by then resuming the interrupted reading.
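
The interrupt-and-resume behavior can be sketched with two queues. This Python example is a simplification under assumptions: the class ReadAloudSession is hypothetical, and the interruption takes effect at the next sentence boundary rather than at the exact moment the announcement's voice data is ready.

```python
from collections import deque
from typing import Deque, Iterable, Optional


class ReadAloudSession:
    """Hypothetical reader that lets announcements jump ahead of the remaining text."""

    def __init__(self, sentences: Iterable[str]) -> None:
        self._pending: Deque[str] = deque(sentences)
        self._interruptions: Deque[str] = deque()

    def interrupt(self, announcement: str) -> None:
        """Queue an announcement, for example the current time."""
        self._interruptions.append(announcement)

    def next_utterance(self) -> Optional[str]:
        """Return what should be spoken next, or None when nothing is left."""
        if self._interruptions:
            return self._interruptions.popleft()
        if self._pending:
            return self._pending.popleft()
        return None
```

After the announcement has been spoken, the next call returns the sentence that follows the interruption point, so reading resumes where it left off.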
[0034]Reproduction of voice data embedded in an HTML file can also be controlled. Drawing 11 is a
block diagram for the case of playing back music or reading content in one embodiment of this
invention. When the voice data is content such as music or a reading and is to be played, the web
browser 11 automatically starts the software 14 for reproduction that can play it. Because the
voice browser client 13 controls the software 14 for reproduction, the conventional functions of
that software, such as pause and reproduction, can be controlled by voice.
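
The three modes of the reproduction means can be modeled as a small state machine driven by recognized commands. In the Python sketch below, the command words "play", "pause", and "stop" and the set of allowed transitions are assumptions; only the existence of the three modes comes from the description.

```python
from enum import Enum


class Mode(Enum):
    STOPPED = "stopped"
    PLAYING = "playing"
    PAUSED = "paused"


# Hypothetical transition table: (recognized command, current mode) -> next mode.
TRANSITIONS = {
    ("play", Mode.STOPPED): Mode.PLAYING,
    ("play", Mode.PAUSED): Mode.PLAYING,
    ("pause", Mode.PLAYING): Mode.PAUSED,
    ("stop", Mode.PLAYING): Mode.STOPPED,
    ("stop", Mode.PAUSED): Mode.STOPPED,
}


def apply_command(mode: Mode, command: str) -> Mode:
    """Return the mode after a spoken command; unknown or invalid commands change nothing."""
    return TRANSITIONS.get((command, mode), mode)
```

Keeping the transitions in a table makes it easy to see which spoken commands are meaningful in each mode.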

[0035] Drawing 12 is a sequence chart of reproduction control of voice data of one embodiment of
this invention. First, when the voice browser client 13 issues a URL instruction to the web browser
11 (Step 201), the web browser 11 transmits it to the Internet 70. The web browser 11 then acquires
the HTML from the Internet 70 and passes it to the voice browser client 13, which analyzes it. The
web browser 11 requests the voice data embedded in the HTML from the Internet 70 (Step 203),
acquires the response to that request from the Internet 70, starts the software 14 for
reproduction, and passes the data to it (Step 204). The voice browser client 13 issues a speech
recognition request to the speech recognition engine 52 based on the analysis result (Step 205),
and when a speech recognition result is obtained, it is returned to the voice browser client 13
(Step 206). The voice browser client 13 then controls the software 14 for reproduction and plays
the sound.

[0036]This invention is not limited to the above-mentioned embodiment, and various changes and
applications are possible within the scope of the Claims.

[0037] 

[Effect of the Invention]As described above, according to this invention, WWW information on the
Internet can be accessed by voice input and output as speech. When the accessed information is in
English, the translation function allows it to be output in Japanese, so even a user with little
knowledge of English can grasp the content.
[0038]Since not only text but also music information and reading information can be accessed, the
system can also be used for entertainment. When music information or reading information is
reproduced, reproduction controls such as pause, reproduction, and stop are available, as on an
ordinary radio cassette recorder (one with output, reproduction, and recording functions), so
users feel little resistance to using it.

[0039]Since reading aloud can be interrupted to hear a time announcement, a visually impaired
person, or a user who has no clock at hand, can learn the time by voice. With the functions above,
the system can support Internet use by visually impaired persons.



[Brief Description of the Drawings]

[Drawing 1] It is a principle block diagram of this invention. 

[Drawing 2] It is a configuration diagram of a system to which this invention is applied.

[Drawing 3] It is a system configuration diagram of one embodiment of this invention.

[Drawing 4] It is a sequence chart of operation of one embodiment of this invention.

[Drawing 5] It is a figure showing the usual user interface of the web browser of one embodiment of
this invention.

[Drawing 6] It is an example read aloud by the voice browser of one embodiment of this invention.

[Drawing 7] It is an example of the form of a homepage displayed in a multi-frame configuration by
the web browser of one embodiment of this invention.

[Drawing 8] It is an example read aloud by the voice browser of one embodiment of this invention.

[Drawing 9] It is an example in which an image is displayed by the web browser of one embodiment of
this invention.

[Drawing 10] It is an example read aloud by the voice browser of one embodiment of this invention.

[Drawing 11] It is a block diagram for the case of playing back music or reading content in one
embodiment of this invention.

[Drawing 12] It is a sequence chart of the reproduction control of the voice data of one embodiment
of this invention.

[Explanations of letters or numerals] 

10 Client terminal 

11 Web browser

12 Button monitoring program for voice input

13 Voice browser client 

14 Software for reproduction 

20, 30, 40, 50, and 60 Workstation 

21 Proxy server 

22 English-Japanese translation engine 
31 Japanese-English translation engine 
41 Proxy server 

51 Speech recognition I/F program 

52 Speech recognition engine 

53 Morphological-analysis engine 
61 Voice synthesis engine 

70 Internet 

100 Server

101 Voice request acquisition means

102 Voice recognition means 

103 Request transmitting means

104 Voice data creating means 

105 Voice data transmitting means 
110 Title 

120 Text

130 Link items 

200 Client 
201 Voice input means

202 Request issuing means

203 Voice output means 
210 The first frame 
220 The second frame 
310 Imaged figure 